Enhancing Unlexicalized Parsing Performance Using a Wide Coverage Lexicon, Fuzzy Tag-Set Mapping, and EM-HMM-Based Lexical Probabilities
Authors
Abstract
We present a framework for interfacing a PCFG parser with lexical information from an external resource that follows a different tagging scheme from the treebank. This is achieved by defining a stochastic mapping layer between the two resources. Lexical probabilities for rare events are estimated in a semi-supervised manner from a lexicon and large unannotated corpora. We show that this solution greatly enhances the performance of an unlexicalized Hebrew PCFG parser, resulting in state-of-the-art Hebrew parsing results both when a segmentation oracle is assumed and in a real-world scenario of parsing unsegmented tokens.
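One way to read the stochastic mapping layer described in the abstract is as a marginalization over the external resource's tags: the probability of a word under a treebank tag is a mixture of the word's probabilities under the lexicon tags that the treebank tag may correspond to. The sketch below illustrates that reading only; it is not the authors' implementation. The tag names, words, probabilities, the `rare_threshold` cutoff, and the function names `mapped_emission` and `emission` are all hypothetical, and the external emission table stands in for whatever semi-supervised estimator supplies lexical probabilities for rare words.

```python
# Hypothetical toy data: every tag, word, and number here is illustrative.

# P(lexicon_tag | treebank_tag): the stochastic mapping layer (assumed form).
TAG_MAPPING = {
    "NN": {"noun": 0.9, "participle": 0.1},
    "VB": {"verb": 0.8, "participle": 0.2},
}

# P(word | lexicon_tag), assumed to come from an external estimator
# built from a lexicon and unannotated text.
EXTERNAL_EMISSIONS = {
    "noun": {"w_rare": 0.02},
    "participle": {"w_rare": 0.001},
    "verb": {"w_other": 0.03},
}


def mapped_emission(word, treebank_tag):
    """Sum over lexicon tags of P(lex_tag | treebank_tag) * P(word | lex_tag)."""
    total = 0.0
    for lex_tag, p_map in TAG_MAPPING.get(treebank_tag, {}).items():
        total += p_map * EXTERNAL_EMISSIONS.get(lex_tag, {}).get(word, 0.0)
    return total


def emission(word, treebank_tag, treebank_counts, tag_totals, rare_threshold=5):
    """Use treebank relative frequency for frequent words, the mapped
    external estimate for rare or unseen ones (assumed back-off policy)."""
    word_count = sum(treebank_counts.get(word, {}).values())
    if word_count >= rare_threshold:
        return treebank_counts[word].get(treebank_tag, 0) / tag_totals[treebank_tag]
    return mapped_emission(word, treebank_tag)


# Toy treebank statistics (hypothetical).
counts = {"w_common": {"NN": 8, "VB": 2}}
totals = {"NN": 100, "VB": 50}

p_common = emission("w_common", "NN", counts, totals)  # treebank estimate
p_rare = emission("w_rare", "NN", counts, totals)      # mapped external estimate
```

In this sketch a hard frequency cutoff decides which estimate is used; a smoother interpolation between the two sources would be an equally plausible design.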
Similar resources
Integrating Probabilistic and Knowledge-based Approaches to Corpus Parsing
We have developed a prototype system for syntactic parsing of corpus text based on a wide-coverage unification-based grammar of English and domain-independent statistical techniques for selecting the most plausible parses from the typically large number licensed by the grammar. Although the results from initial experiments are promising, the system is ‘brittle’, relying particularly on the corr...
Lexicalized Beam Thresholding Parsing with Prior and Boundary Estimates
We use prior and boundary estimates as the approximation of outside probability and establish our beam thresholding strategies based on these estimates. Lexical items, e.g. head word and head tag, are also incorporated to lexicalized prior and boundary estimates. Experiments on the Penn Chinese Treebank show that beam thresholding with lexicalized prior works much better than that with unlexica...
Automatically Extending the Lexicon for Parsing
This paper describes a method for automatically extending the lexicon of wide-coverage parsers. The method is an extension to the automatic detection of coverage problems of natural language parsers, based on large amounts of raw text (van Noord 2004). The goal is to extend grammar coverage, focusing in particular on the acquisition of lexical information for missing and incomplete lexicon entr...
A Large-scale Inheritance-based Morphological Lexicon for Russian
In this paper we describe the mapping of Zaliznjak’s (1977) morphological classes into the lexical representation language DATR (Evans and Gazdar 1996). On the basis of the resulting DATR theory a set of fully inflected forms together with their associated morphosyntax can automatically be generated from the electronic version of Zaliznjak’s dictionary (Ilola and Mustajoki 1989). From this data...
ITRI-03-02 A large-scale inheritance-based morphological lexicon for Russian
In this paper we describe the mapping of Zaliznjak’s (1977) morphological classes into the lexical representation language DATR (Evans and Gazdar 1996). On the basis of the resulting DATR theory a set of fully inflected forms together with their associated morphosyntax can automatically be generated from the electronic version of Zaliznjak’s dictionary (Ilola and Mustajoki 1989). From this data...
Publication date: 2009